Estimating the Empirical Null Distribution of Maxmean Statistics in Gene Set Analysis

Authors

  • Xing Ren
  • Jianmin Wang
  • Song Liu
  • Jeffrey C. Miecznikowski
Abstract

Gene Set Analysis (GSA) is a framework for testing the association between a set of genes and an outcome, e.g. disease status or treatment group. The method relies on computing a maxmean statistic and estimating the null distribution of the maxmean statistics via a restandardization procedure. In practice, genes within a pre-determined gene set are more strongly correlated with one another than with genes in other sets, which may bias the estimated null distribution. We derive an asymptotic null distribution of the maxmean statistics based on a sparsity assumption, and we propose a flexible two-group mixture model for the maxmean statistics. The mixture model allows us to estimate the null parameters empirically via a maximum likelihood approach. We compare our empirical method with the restandardization procedure of GSA in simulations and show that it estimates the null density more accurately when the genes are strongly correlated within gene sets.
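For readers unfamiliar with the maxmean statistic named in the abstract, the following is a minimal sketch of how it is commonly defined for one gene set: average the positive parts and the negative parts of the per-gene scores separately, then keep whichever average is larger. The function name `maxmean`, the signed return value, and the example scores are assumptions made for illustration; they are not taken from the paper, and the restandardization and mixture-model steps are not shown.

```python
def maxmean(scores):
    """Signed maxmean of per-gene scores for one gene set (illustrative sketch).

    The positive-part mean and negative-part mean are both taken over ALL
    genes in the set, so a few strong scores are diluted by the set size.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("gene set must contain at least one score")
    # Mean of the positive parts: max(z, 0) averaged over the whole set.
    pos_mean = sum(max(z, 0.0) for z in scores) / n
    # Mean of the negative parts: max(-z, 0) averaged over the whole set.
    neg_mean = sum(max(-z, 0.0) for z in scores) / n
    # Keep the larger of the two, signed by its direction (an assumed convention).
    return pos_mean if pos_mean >= neg_mean else -neg_mean


# Hypothetical per-gene scores for two small gene sets.
up_set = [2.0, -1.0, 0.5, 3.0]
down_set = [-2.0, -2.0, 1.0, 0.0]
print(maxmean(up_set), maxmean(down_set))
```

Because both averages divide by the full set size, the statistic responds to a coordinated shift in one direction while staying comparatively insensitive to a single extreme gene.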

Similar resources

Estimating empirical null distributions for Chi-squared and Gamma statistics with application to multiple testing in RNA-seq

Genome and transcriptome studies using microarray and RNA-seq technologies often involve simultaneous hypothesis testing of thousands of genes or transcripts. A key step in determining significant differential expression in such large-scale testing is obtaining the null distribution of the test statistics. We show by examples that the asymptotic null is often inappropriate for many of the χ² tests ...

Empirical phi-divergence test statistics for testing simple and composite null hypotheses

The main purpose of this paper is first to introduce a new family of empirical test statistics for testing a simple null hypothesis when the vector of parameters of interest is defined through a specific set of unbiased estimating functions. This family of test statistics is based on a distance between two probability vectors, with the first probability vector obtained by maximizing the empiri...

A New Approximation for the Null Distribution of the Likelihood Ratio Test Statistics for k Outliers in a Normal Sample

Usually when performing a statistical test or estimation procedure, we assume the data are all observations of i.i.d. random variables, often from a normal distribution. Sometimes, however, we notice in a sample one or more observations that stand out from the crowd. These observation(s) are commonly called outlier(s). Outlier tests are more formal procedures which have been developed for detec...

Estimating the null distribution for conditional inference and genome-scale screening

In a novel approach to the multiple testing problem, Efron (2004; 2007) formulated estimators of the distribution of test statistics or nominal p-values under a null distribution suitable for modeling the data of thousands of unaffected genes, non-associated single-nucleotide polymorphisms, or other biological features. Estimators of the null distribution can improve not only the empirical Bayes...

Impact of Outliers in Data Envelopment Analysis

This paper will examine the relationship between "Data Envelopment Analysis" and the statistical concept of an "Outlier". Data envelopment analysis (DEA) is a method for estimating the relative efficiency of decision making units (DMUs) that perform similar tasks in a production system, using multiple inputs to produce multiple outputs. An important issue in statistics is to identify the outliers. In this pap...

Publication date: 2017